Search results for "Determining the number of clusters in a data set"

Showing 9 of 9 documents

Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters

2017

Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Se…
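The role of the length scale σ can be sketched numerically. Below is a minimal 1-D illustration of the Parzen wave function and the quantum potential whose minima QC treats as cluster centres, following the standard form used in the quantum clustering literature (potential written up to its constant offset E). The data, the σ value, and the function names are invented for this example; this is not the authors' implementation.

```python
import math

def parzen(x, data, sigma):
    # Parzen (Gaussian-kernel) estimate of the wave function psi(x), 1-D case
    return sum(math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def quantum_potential(x, data, sigma):
    # Quantum potential up to the constant E, from the Schrodinger-equation
    # form used by QC (1-D): V(x) ~ -1/2 + (1 / (2 sigma^2 psi)) * sum (x-xi)^2 K(x,xi)
    psi = parzen(x, data, sigma)
    weighted = sum((x - xi) ** 2 * math.exp(-(x - xi) ** 2 / (2 * sigma ** 2))
                   for xi in data)
    return -0.5 + weighted / (2 * sigma ** 2 * psi)

# two well-separated 1-D clusters (toy data)
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
sigma = 0.5
v_centre = quantum_potential(0.0, data, sigma)  # at a cluster centre
v_gap = quantum_potential(2.5, data, sigma)     # midway between clusters
print(v_centre < v_gap)  # True: minima of V sit at the cluster centres
```

Re-running with a much larger σ flattens both ψ and V, which is exactly the dependence on the length parameter that the proposed framework tunes.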

Keywords: Clustering high-dimensional data; Mathematical optimization; Cognitive Neuroscience; Single-linkage clustering; Correlation clustering; Computer Science Applications; Hierarchical clustering; Determining the number of clusters in a data set; Artificial Intelligence; Cluster (physics); Cluster analysis; Algorithm; k-medians clustering; Mathematics; Neurocomputing

SMART: Unique splitting-while-merging framework for gene clustering

2014

© 2014 Fa et al. Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number …

Keywords: Clustering algorithms; Microarrays; Gene Expression; Bioinformatics; Cell Signaling; Data Mining; Cluster Analysis; Finite mixture model; Oligonucleotide Array Sequence Analysis; Physics; Multidisciplinary; SMART framework; Constrained clustering; Competitive learning model; Bioassays and Physiological Analysis; Multigene Family; Canopy clustering algorithm; Engineering and Technology; Information Technology; Genomic Signal Processing; Algorithms; Research Article; Signal Transduction; Computer and Information Sciences; Fuzzy clustering; Correlation clustering; Research and Analysis Methods; Clustering; Molecular Genetics; CURE data clustering algorithm; Genetics; Gene Regulation; Cluster analysis; Gene Expression Profiling; Biology and Life Sciences; Computational Biology; Cell Biology; Determining the number of clusters in a data set; Splitting-merging awareness tactics (SMART); Signal Processing; Affinity propagation; Gene expression; Clustering framework

Bayesian versus data driven model selection for microarray data

2014

Clustering is one of the most well-known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. One of the most difficult challenges in this area is a particular instance of the model selection problem: the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we refer to that instance simply as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and both are based on internal knowledge; that is, they use information obtained by processing the input data. A…
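As a toy illustration of the data-driven side of this trade-off, the sketch below selects k for a 1-D dataset by penalizing a naive k-means fit with a BIC-style term. The scoring formula (n·ln(SSE/n) + k·ln(n)), the initialization, and all names are simplifications invented for this example, not the paper's method.

```python
import math

def kmeans_1d(data, k, iters=20):
    # naive 1-D k-means; initial centres spread evenly over the data range
    # (illustrative only - real k-means needs careful initialization)
    lo, hi = min(data), max(data)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda j: abs(x - centres[j]))
            groups[j].append(x)
        centres = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
    sse = sum(min((x - c) ** 2 for c in centres) for x in data)
    return centres, sse

def bic_score(data, k):
    # heuristic BIC for a k-centre model: n*ln(SSE/n) + k*ln(n); lower is better
    n = len(data)
    _, sse = kmeans_1d(data, k)
    return n * math.log(sse / n + 1e-12) + k * math.log(n)

data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
best_k = min(range(1, 5), key=lambda k: bic_score(data, k))
print(best_k)  # 2: the penalty stops the score improving past the true split
```

The ln(n) penalty is what distinguishes BIC from a pure data-driven SSE comparison, which would keep improving as k grows.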

Keywords: Clustering; Model selection; Bayesian information criterion; Akaike information criterion; Minimum message length; Bioinformatics; Settore INF/01 - Informatica; Computer science; Bayesian probability; Machine learning; Computer Science Applications; Data-driven; Determining the number of clusters in a data set; Identification (information); Data mining; Artificial intelligence; Cluster analysis

Clustering categorical data: A stability analysis framework

2011

Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but k-means is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which creates the difficulty of selecting the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect inherent in the calculation …
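A minimal sketch of the k-modes idea described above: simple-matching dissimilarity between categorical records, with per-attribute modes as prototypes. The initial modes here are fixed by hand, which is exactly the initialization sensitivity the abstract discusses; the data and names are invented for illustration.

```python
from collections import Counter

def hamming(a, b):
    # simple-matching dissimilarity between two categorical records
    return sum(x != y for x, y in zip(a, b))

def kmodes(data, modes, iters=10):
    # one illustrative run of k-modes from user-supplied initial modes
    # (real implementations choose initial modes carefully)
    clusters = [[] for _ in modes]
    for _ in range(iters):
        clusters = [[] for _ in modes]
        for rec in data:
            j = min(range(len(modes)), key=lambda j: hamming(rec, modes[j]))
            clusters[j].append(rec)
        # new prototype = per-attribute mode of each cluster
        modes = [tuple(Counter(col).most_common(1)[0][0] for col in zip(*c))
                 if c else modes[j] for j, c in enumerate(clusters)]
    return modes, clusters

data = [("red", "s"), ("red", "m"), ("blue", "l"), ("blue", "l"), ("red", "s")]
modes, clusters = kmodes(data, [("red", "s"), ("blue", "l")])
print(modes)  # [('red', 's'), ('blue', 'l')]
```

Note that the mode of a noisy attribute can flip on a single changed record, which is the instability (no smoothing, unlike the mean) that motivates the paper's stability analysis.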

Keywords: Computer science; Single-linkage clustering; Correlation clustering; Constrained clustering; Machine learning; Determining the number of clusters in a data set; Data stream clustering; CURE data clustering algorithm; Consensus clustering; Data mining; Artificial intelligence; Cluster analysis. Published in: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Distance-constrained data clustering by combined k-means algorithms and opinion dynamics filters

2014

Data clustering algorithms represent mechanisms for partitioning huge arrays of multidimensional data into groups with small in-group and large out-group distances. Most of the existing algorithms fail when a lower bound for the distance among cluster centroids is specified, while this type of constraint can help in obtaining a better clustering. Traditional approaches require that the desired number of clusters is specified a priori, which requires either a subjective decision or global meta-information knowledge that is not easily obtainable. In this paper, an extension of the standard data clustering problem is addressed, including additional constraints on the cluster centroid di…
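One simple way to read the centroid-distance constraint: after an unconstrained clustering, centroids that violate the lower bound can be merged until the bound holds, so the cluster number emerges from the bound rather than being fixed a priori. The sketch below does this for sorted 1-D centroids; it is an invented illustration of the constraint itself, not the combined k-means/opinion-dynamics scheme the paper develops.

```python
def merge_close_centroids(centroids, dmin):
    # greedily merge 1-D centroids closer than dmin (left to right),
    # so the surviving centroids satisfy the lower-bound constraint
    cents = sorted(centroids)
    merged = [cents[0]]
    for c in cents[1:]:
        if c - merged[-1] < dmin:
            merged[-1] = (merged[-1] + c) / 2  # collapse the violating pair
        else:
            merged.append(c)
    return merged

# four unconstrained centroids, two of which are closer than the bound
print(merge_close_centroids([0.0, 0.3, 2.0, 5.0], 1.0))  # three survive
```

After the merge, points would be reassigned to the surviving centroids; the greedy left-to-right pass is only one of several possible merge orders.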

Keywords: Fuzzy clustering; Correlation clustering; Single-linkage clustering; Constrained clustering; Determining the number of clusters in a data set; Settore ING-INF/04 - Automatica; Data clustering; k-means; Opinion dynamics; Hegselmann-Krause model; CURE data clustering algorithm; Data mining; Cluster analysis; Algorithm; k-medians clustering; Mathematics. Published in: 22nd Mediterranean Conference on Control and Automation

Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

2017

Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations (clusters), their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, the characteristics of a representative set of internal clustering validation indices on many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location, so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…
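One widely used internal validation index is the silhouette. The stdlib-only 1-D sketch below (not the article's experimental framework) shows how such an index scores a partition purely from the data, preferring the labelling that respects the gap structure; all names and data are illustrative.

```python
def silhouette(data, labels):
    # mean silhouette width for a 1-D dataset and its cluster labels
    def d(a, b):
        return abs(a - b)
    scores = []
    for i, x in enumerate(data):
        # a = mean distance to own cluster, b = mean distance to nearest other
        own = [d(x, y) for j, y in enumerate(data)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own) if own else 0.0
        others = set(labels) - {labels[i]}
        b = min(sum(d(x, y) for j, y in enumerate(data) if labels[j] == k) /
                sum(1 for l in labels if l == k) for k in others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
good = [0, 0, 0, 1, 1, 1]   # matches the visible gap
bad = [0, 1, 0, 1, 0, 1]    # ignores it
print(silhouette(data, good) > silhouette(data, bad))  # True
```

Sweeping this score over candidate cluster counts, and taking the best, is the basic recipe that all the compared indices follow with different internal formulas.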

Keywords: Fuzzy clustering; Computer science; Single-linkage clustering; Correlation clustering; Theoretical Computer Science; Prototype-based clustering; Clustering validation index; Robust statistics; CURE data clustering algorithm; Consensus clustering; Cluster analysis; k-medians clustering; Numerical Analysis; Pattern recognition; Determining the number of clusters in a data set; Computational Mathematics; Computational Theory and Mathematics; Artificial intelligence; Data mining; Algorithms

Distributed Data Clustering via Opinion Dynamics

2015

We provide a distributed method to partition a large set of data in clusters, characterized by small in-group and large out-group distances. We assume a wireless sensors network in which each sensor is given a large set of data and the objective is to provide a way to group the sensors in homogeneous clusters by information type. In previous literature, the desired number of clusters must be specified a priori by the user. In our approach, the clusters are constrained to have centroids with a distance at least ε between them and the number of desired clusters is not specified. Although traditional algorithms fail to solve the problem with this constraint, it can help obtain a better cluste…
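The opinion-dynamics mechanism referenced here can be sketched in a few lines for scalar data: in Hegselmann-Krause dynamics each value repeatedly averages with all values within ε of it, and the fixed points are cluster centres at pairwise distance at least ε, so the number of clusters emerges from ε rather than being specified by the user. This is an illustrative centralized scalar version, not the distributed sensor-network algorithm of the paper.

```python
def hk_step(opinions, eps):
    # one Hegselmann-Krause update: each value moves to the average of
    # all values within eps of it (including itself)
    new = []
    for x in opinions:
        near = [y for y in opinions if abs(x - y) <= eps]
        new.append(sum(near) / len(near))
    return new

def hk_cluster(opinions, eps, iters=50):
    # iterate until the opinions stop moving, then report distinct centres
    for _ in range(iters):
        nxt = hk_step(opinions, eps)
        if max(abs(a - b) for a, b in zip(nxt, opinions)) < 1e-9:
            break
        opinions = nxt
    return sorted(set(round(x, 6) for x in opinions))

data = [0.0, 0.2, 0.4, 3.0, 3.1, 7.0]
centres = hk_cluster(data, eps=1.0)
print(centres)  # three centres emerge, pairwise distance >= eps
```

In the distributed setting of the paper, each sensor would run the averaging locally over neighbours' values, but the ε-separation of the resulting centroids is the same.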

Keywords: Theoretical computer science; Computer Networks and Communications; Computer science; General Engineering; Constrained clustering; Partition (database); Networks; Determining the number of clusters in a data set; Consensus; Settore ING-INF/04 - Automatica; Consensus problems; Wireless; Cluster analysis

A novel heuristic memetic clustering algorithm

2013

In this paper we introduce a novel clustering algorithm based on the Memetic Algorithm meta-heuristic wherein clusters are iteratively evolved using a novel single operator employing a combination of heuristics. Several heuristics are described and employed for the three types of selections used in the operator. The algorithm was exhaustively tested on three benchmark problems and compared to a classical clustering algorithm (k-Medoids) using the same performance metrics. The results show that our clustering algorithm consistently provides better clustering solutions with less computational effort.

Keywords: Determining the number of clusters in a data set; Biclustering; Clustering high-dimensional data; DBSCAN; Theoretical computer science; CURE data clustering algorithm; Correlation clustering; Canopy clustering algorithm; Cluster analysis; Algorithm; Mathematics. Published in: 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

An efficient cluster-based outdoor user positioning using LTE and WLAN signal strengths

2015

In this paper we propose a novel cluster-based RF fingerprinting method for outdoor user-equipment (UE) positioning using both LTE and WLAN signals. It uses a simple, cost-effective agglomerative hierarchical clustering with the Davies-Bouldin criterion to select the optimal cluster number. The positioning method does not require training-signature formation prior to the UE position estimation phase. It is capable of reducing the search space for the clustering operation by using an LTE cell-ID searching criterion. This enables the method to estimate the UE position in a short time with less computational expense. To validate the cluster-based positioning, real-time field measurements were collected using readi…
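The Davies-Bouldin criterion used above for selecting the cluster number can be sketched as follows for 1-D partitions: each cluster's average scatter is compared to its separation from every other cluster, and lower values are better. This is a from-scratch illustration of one common variant of the index, not the paper's positioning pipeline.

```python
def davies_bouldin(clusters):
    # clusters: list of lists of 1-D points; lower DB = better partition
    cents = [sum(c) / len(c) for c in clusters]
    # within-cluster scatter: mean absolute distance to the centroid
    scatter = [sum(abs(x - m) for x in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    db = 0.0
    for i in range(k):
        # worst similarity ratio of cluster i against any other cluster
        db += max((scatter[i] + scatter[j]) / abs(cents[i] - cents[j])
                  for j in range(k) if j != i)
    return db / k

two = [[0.0, 0.1, 0.2], [5.0, 5.1, 5.2]]
three = [[0.0, 0.1], [0.2], [5.0, 5.1, 5.2]]
print(davies_bouldin(two) < davies_bouldin(three))  # the 2-way split wins
```

In the agglomerative setting, the index would be evaluated at each candidate cut of the dendrogram and the cut with the minimum value chosen.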

Keywords: Computer science; Real-time computing; LTE cell-ID; Fingerprint recognition; Grid; Minimization of drive tests; Determining the number of clusters in a data set; Embedded system; Grid-based RF fingerprinting; Radio frequency; Cluster analysis; Hierarchical clustering